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The goal of this paper is to describe the application of quasi- 
likelihood estimating equations for spatially correlated binary data. 
In this paper, a logistic function is used to model the marginal prob- 
ability of binary responses in terms of parameters of interest. With 
mild assumptions on the correlations, the Leonov-Shiryaev formula 
combined with a comparison of characteristic functions can be used 
to establish asymptotic normality for linear combinations of the bi- 
nary responses. The consistency and asymptotic normality for quasi- 
likelihood estimates can then be derived. By modeling spatial cor- 
relation with a variogram, we apply these asymptotic results to test 
independence of two spatially correlated binary outcomes and illus- 
trate the concepts with a well-known example based on data from 
Lansing Woods. The comparison of generalized estimating equations 
and the proposed approach is also discussed. 

1. Introduction. This paper was originally motivated by a question aris- 
ing in forest ecology. Specifically, for each of a number of trees in a plantation, 
it is possible to determine the status of the tree (alive or dead) and whether 
a particular insect pest is present on the tree. The ecological question of 
interest is whether tree status is independent of the presence/absence of the 
pest. Due to biotic interaction such as mutualism or competition, observa- 
tions recorded at locations usually exhibit spatial autocorrelation and this 
renders the traditional methods unsuitable. The question then becomes: how 
do we test for independence of two binary variables in the presence of spatial 
autocorrelation? 

To abstract the problem, suppose that we have observations Y = (Y"(si), . . . , Y(sjv)) j 
that are binary responses drawn from a random field on an m x n grid with 
N = mn, and where Sj S R 2 denotes the location of the ith binary variable. 
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The mean response 6 = E(Y) is assumed to be associated with the mea- 
surements of explanatory variables T through a link function h{6) = T(3. 
The problem of detecting association then turns to the estimation of the 
parameters @. 

There is relatively little literature on the estimation of fixed effects (3 
when observations are binary and spatially dependent. Albert and McShane 
[2] and Gotway and Stroup [11] mentioned the use of generalized estimating 
equations (GEE) in this setting. However, the GEE approach has been most 
fully developed for longitudinal data [15]. These methods do not directly 
apply to spatial data, and therefore there is a need to develop an estimat- 
ing equation approach corresponding to the covariance structures typically 
found in spatial statistics, which are neither diagonal nor block diagonal. 

To proceed, we will rely on quasi-likelihood (QL) ideas. The concept 
of quasi-likelihood functions was first introduced by Wedderburn [25] for 
observations from the exponential family of distributions with only mean 
and variance being specified. McCullagh (year?) and McCullagh and Nelder 
(year?) broadened the application to multivariate cases and discuss asymp- 
totic properties. A potential obstacle in applying QL methods is that most 
of the developed methods were mainly based on the assumption of inde- 
pendent observations. Although QL functions can be defined for dependent 
data, McCullagh and Nelder [20] raised some concerns about the application 
to dependent data. 

There are some examples in the literature of the application of QL es- 
timating equations to correlated data, although most of these are focused 
on count data. Zeger [26] developed QL estimating equations for time series 
count data for a specific type of covariance function, and this approach was 
also applied by McShane, Albert and Palmatier [21] to spatial data. Papers 
more closely related to the current one are those of Heagerty and Lumley 
[13] and Lumley and Heagerty [17]. They developed a useful approach for 
the variance estimation of spatially correlated count data by combining the 
concept of window subsampling and the QL estimating function. (We discuss 
the contrast between their approach and ours below.) 

We next review the concept of QL functions preliminary to our extension 
of QL functions to spatially dependent data. The definition of QL functions 
used in this paper is the same as that of McCullagh and Nelder ([20], page 
327). On the assumption that the covariance matrix cov(Y) = ?/> 2 V is pro- 
portional to some function of the mean, the quasi-score function is given 
by 

(1.1) U(/3; Y) =P T V- 1 [Y - 0(/3)]/V 2 . 

Here ip 2 is a constant independent of 6 and P is the derivative matrix 
dO/d/3. The quasi-score function (1.1) is still relevant when observations are 
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dependent, although some restrictions must be imposed on the covariance 
matrix V in order that some scalar functions whose gradient vector is equal 
to U(/3) exist ([20], page 333). At the observed Y = y, the QL estimate (3 
is the solution of a QL estimating equation U(/3; Y = y) = 0. For nonlin- 
ear models, a closed-form expression for f3 is usually impossible to obtain. 
McCullagh [18] suggested investigating statistical properties of (3 by consid- 
ering (3 as a root of the quasi-score function within a neighborhood of the 
true parameter point. 

In the next section we follow this idea to establish consistency and asymp- 
totic normality of QL estimates. Some assumptions are made such that the 
central limit theory for correlated response variables Y holds. An example is 
used to illustrate how QL estimating equations can test dependence between 
cross-classified response variables in Section 3. 

2. Asymptotic normality of quasi-likelihood estimates. In this paper we 
focus on a logistic link function: logit(0) = T/3, where (3 = (/?o, • • • , (3u) T and 
the ith row of T is tj = (1, tn, . . . ,ti U ) with tij = or 1. The individual 
parameters 9i = E(Y(si)\ti) are thus given by exp(tf (3) / (1 + exp(tf (3)) for 
i = 1, . . . , N, and the (ij)th component of P of (1.1) is 



(2-1) (P)i 



(Oiil-Oi), ifj = l, 

\Uj-i0i(l-Bi), if j = 2,...,u + l. 

Let 7ij = corr(y(sj), Y(sj)) denote the correlation between response vari- 
ables at sites Sj and Sj. In this paper Y is assumed to be an isotropic process, 
so that jij depends only on the distance between Sj and Sj and is independent 
of 6. For QL estimates to exist, the covariance structure must follow certain 
restrictions [26]. In our covariance matrix, the (ij)th entry of its inverse can 
be represented by V^ 1 = (0"j<7.,-7jj) _1 , where <Tj = \f9iil — 9i). Thus V"- 1 is 
independent of 9^ and dV^ /dO^, d\~f}/d9j, dV^^/d9i are equal to zero. 
This satisfies the condition for existence provided by McCullagh and Nelder 
([20], page 334). 

To ensure that the central limit theory holds, some restrictions should 
be imposed on 7^. One typical assumption is that the observations should 
satisfy long-range independence [6]. With this assumption, the random vari- 
ables can be divided into blocks and treated as independent. The classical 
central limit theorem then leads to asymptotic normality for the sum of 
random variables immediately. 

A condition more general than long-range independence is strong mixing 
[24]. For a stationary random field satisfying appropriate mixing conditions, 
Bolthausen [4] showed that the central limit theorem holds for grided data 
011 Z k . This version of a central limit theorem has been cited several times 
in the literature of spatial statistics [12, 13, 22]. Nevertheless, one of the 



4 



P.-S. LIN AND M. K. CLAYTON 



assumptions for this central limit theorem is that the random fields are 
described with respect to an norm ([4], page 1047, and [22], page 56). 

In contrast, in this paper we show the existence of asymptotic normality 
with respect to any L p norm by requiring correlations to decrease exponen- 
tially with distance. This allows us to extend the QL estimating function to 
the L\ or Euclidean distance, which are the metrics most commonly used 
in spatial data. (Although we do not pursue the idea here, we conjecture 
that Assumption 2.1 could be adapted to more general conditions, includ- 
ing mixing conditions. Doing so would permit a tighter linkage between our 
results and those of the above-cited authors.) 

Assumption 2.1. jij = ap d ( Si ' Sj \ where a is a positive constant such 
that 0<ap< 1, pE [0,1]. d(si 1 ,Si 2 ) denotes the L p distance between Sj and 

Sj. 

Let Cfe(si, . . . , Sfc) and m&(si, . . . , s^) denote the cumulant and product 
moment functions of order k, respectively, for centered variables Z = Y — 0. 
The Leonov-Shiryaev formula ([14], page 21) leads to 

m k (s h , ... ,SjJ - ^2 m 2 (s UJl ,s (Aj2 ) x • • • x m 2 {s UJk _ 1 ,s LlJk ) = c k (si, . . . ,s k ) 

.y.1, . . . ,fc 

fe/2 

for even k, where T^'' denotes the collection of all possible sets whose 

elements are | disjoint pairs from {1, ...,&}. For odd k, mfe(si, . . . , s k ) = 
Cfc(si, . . . ,Sfc). We now make some assumptions on c k (si, . . . ,s k ). 

Assumption 2.2. Some a k < | exist such that Ylf vl t-M k Cfc(si 15 . . . ,s ife ) = 
0(N ak ). 

There are a number of processes for which this assumption holds, in- 
cluding examples such as independent and m-dependent processes ([5], page 
20). In a separate paper we hope to characterize more thoroughly the set 
of processes satisfying Assumption 2.2. We use Assumptions 2.1 and 2.2 to 
establish Theorem 2.1, whose proof is outlined in the Appendix. 

Theorem 2.1. For binary variables Y satisfying Assumptions 2.1 and 
2.2, ^a'(Y - 0) ~ 7V(0, ^a'Va) + Op(^=) for any bounded vector a. 

From a geometrical perspective, the matrix P r V _1 in the QL estimating 
equations represents a projection matrix of the residual vector y — 9 onto 
the space spanned by the columns of T. Before showing consistency of the 
QL estimates, we first study some properties of this matrix. 
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Notation. For matrices A and B, the symbol A y B means that, for 
any % and j, the (ij)th entry of A — B is nonnegative. 

To simplify matters, we assume that the mxn lattice of locations s, is la- 
beled columnwise, so that the first column of locations is labeled si, S2, . . . ,s m , 
the second column is labeled s m+ i, s m+ 2, . . . , S2 m , and so on. Also, in the fol- 
lowing proofs, the notation T a (p; L p ) is used to indicate that the correlation 
matrix depends on a, p and the L p distance, a is assumed to lie within (0, 1] 
because if a = 0, T is a zero matrix and there is nothing to discuss. 

Lemma 2.1. max{|(r~ 1 (/j; = 1, . . . , N} involves only a and 

P- 

Proof. First we study the correlation matrix T a (p;Li) for a = 1 and 
then extend the result to other a G (0, 1). Let Cl n denote an n x n matrix 
with (ft n )ij = 1 if i = j and p^~^ otherwise. Then, T\(p;Li) = fi n ® f2, n , 
where <8> is the Kronecker product. It is possible to show ([23], page 255) 
that rj' 1 (yo;Li) = n~ 1 (g)n~ 1 , where 



(2.2) n- A = (i-/) 



2\-l 



(I -p ••• 0\ 
-p 1 + p 2 -p ••• 



V ... - p ij nxn 



Therefore, the maximum absolute value in r^~ 1 (p;Li), say p* , only involves 
a and p. 

For a general matrix T a (p), a£ (0,1), first notice that T a (p;Li) is a 
positive-definite matrix. Since Ti(p;Li) ^T a (p;Li) y aT±(p; L\), applying 
the strong partial ordering of positive-definite matrices ([20], page 335) gives 

(2.3) ItiHp;Li) t r-^p^Obr^C^Li). 

Thus it follows that the maximum value in T~ 1 (p;Li) is bounded between 
p* j a and p* , and these only involve a and p. □ 

Lemma 2.2. With correlations under the L± metric, max{| (P T V _1 )jj| : i,j 
1,...,N} < Co for some constant Co independent of N. Moreover, some 
constant C\ exists such that limsup^f^^ jj (P t V~ 1 P)m < C\ for all k,l = 
1,...,N. 

Proof. For convenience in the proof, let V a denote the covariance ma- 
trix corresponding to T a (p;Li). To evaluate the first row of P T Vf 1 , note 
from (2.2) that each row or column of fi" 1 has at most three nonzero entries. 
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The Kronecker product then implies that each column of T 1 1 ( / o;Li) has at 
most nine nonzero entries. So, simple algebra gives |(P'Vj" < 9p*9*, 

where p* is given in the proof of Lemma 2.1 and 9* = max{ ^ jjj (1-0^) : hk = 

1,...,N}. This 9* is independent of ./V because at most 2 U possible values 
of 9 exist due to 9k only involving f3 and the binary vector t^. Prom (2.1), 
the distinction between the first and other rows of P^V^ 1 is the multiplier 
tij. It then follows that KP^VJ -1 )^! < ((P^V^ 1 )^! for all i and j because 
tij = or 1. This implies that 9p*9* is an upper bound for the absolute 
values in P^V]" 1 . 

Next, it is easy to see that |(P T Vf 1 P) M | < ££ =1 \(P T V7 1 ) km (P T ) ml \ < 
jNp*9* for all k and I because all entries of P in (2.1) are not larger than 
\. Consequently, limsupjv^^ ■^(P T Vj~ 1 P)/ c / < C for some constant C. 

Finally, to generalize the above to V a , a £ (0, 1), note that all entries in 
P and E^ 1 / 2 are nonnegative. Thus (2.3) implies that 

(2.4) ^ pTy ^ 1 - pTv_1 - p^r 1 

and 

(2.5) -P T V7 1 pyp T v~ 1 pyp T V7 1 'P. 

a 

Applying the results for Vi to (2.4) and (2.5) gives the desired result. □ 

The results of Lemma 2.2 can now be extended to any L p space, 1 < p < 
oo. The following theorem provides this general result. 

Theorem 2.2. Lemma 2.2 holds for the L p metric, 1 <p < oo. 

PROOF. By Minkowski's inequality, we know that the L\ metric is greater 
than the other L p metrics, 1 < p < oo. Since p is between and 1, we have 
r a (p; L p ) y T a (p; L\) and thus the strong partial ordering of positive-definite 
matrices implies (T a (p; L\))~ l y (T a (p; Lp)) -1 . Lemma 2.2 and an argument 
similar to the discussion of (2.4) and (2.5) give the desired result. □ 

The previous theorems show that P^V" 1 is a bounded vector. So, by 
Theorem 2.1, U(/3) is asymptotically normal. We now focus on the L\ and 
Li distances. For simplicity, we note that the role of P T V _1 P in quasi-score 
functions is similar to Fisher's information in ordinary likelihood functions, 
and therefore I(/3) is used below to denote P T V _1 P. 

Theorem 2.3. For Y satisfying Assumptions 2.1 and 2.2, we have 

U(/3) ~ JV ( 0, ^I(/3) )+ P (-^=) asN^oo. 



N V N J \JN 



SPATIAL DATA AND ESTIMATING EQUATIONS 



7 



Next we derive the limiting distribution of the QL estimate. 

Notation, (a) We use 0(N P ) to denote a matrix A satisfying lim sup^^^ -^(A)^ < 
K for all i and j and for some constant K. (b) abs(^4) represents a matrix 
whose (ij)th element equals the absolute value of the (ij)th element of A. 
(c) and denote partial derivatives with respect to j3j and f3, respec- 
tively. 

Assumption 2.3. The matrix -^D / gU(/3) is negative definite at the true 
parameter /3 with probability going to 1 as N — > oo. 

Assumption 2.3 is a common assumption used for the derivative of score 
functions (e.g., [19]). 

Lemma 2.3. The QL estimate (3 is consistent. 

Proof. It is easy to see that 

DpUOS) = (D /3 P T )V" 1 (Y - 0) + P T (D^V- 1 )(Y - 6) - I(J3). 

Let (iio, • • • ,ijvo) T = 1 represent the first column of the design matrix and 
let £j = — 6i), . . . , tATj0jv(l — 6n)), i = 0, . . . ,u, represent the (i + l)st 

row of P T . Then D^fc = (t u t y 0i (1 - 9 X ) (1 - 20i ) , . . . , t Ni t Nj 9 N (1-6 N )(1- 
2#7v)) for j = 0, . . . ,u. Since the tu are binary and |1 — 20j| < 1, it follows 
that £j ^ &bs(Dp £A and thus P T ^ abs(D i gP T ). Similarly, we can show 

that 5T 1/2 ^ abs^D^.S^ 1 / 2 ), where £ = diag(0j(l - 0*), and thus 2V" 1 y 
abs(D / gV" 1 ). 

Therefore, an argument similar to the proof of Theorem 2.2 gives that 
(D^gP^V" 1 and P T (D | gV _1 ) are O(l). By Theorem 2.1, a normal variable 
Z* exists for any bounded vector a such that -^a T (Y — 6) = Z* + Op (-7=) 
as N — > 00. It then follows from the Cramer- Wold device and Theorem 2.1 
that 

(D^P T )V" 1 (Y - 0) + P T (D^V- 1 )(Y - 6) = P (v / A) 
and therefore 

(2.6) D ja U(/3) = Op(v / ]V)-I(/3) as N ^00. 

As a result, -^D / aU( / 9)| j g =j g — ► — -^I(/3 ) with probability going to 1 as N —> 

00. From Assumption 2.4, therefore jtI(/3 ) has a positive-definite limit. 

It follows next by the inverse theorem [3] that an open ball B(j3 ,r) 
exists such that -^U(/3) is one-to-one on the ball with probability going to 

1. Also, j[\J(B(f3 ,r)) contains an open ball J3(^U(/3 ),r*) for some r . 
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By Theorem 2.3, ^U(/3 ) -> E(XJ(P )) = in probability. Hence for this 
r* , || -^U(/3 ) ~~ 1| < r* with probability going to 1, where || • || denotes the 
Euclidean norm. This then implies that G B(^U(/3 ), r*) C ^U(5(/3 , r)) 
with probability going to 1. Since iU(/3) is one-to-one on B((3 ,r) and r 

can be arbitrarily small, -^U(/3) = a.e. implies that P — > P in probability. 

□ 

Theorem 2.4. For Y satisfying Assumptions 2.1-2.3, the QL estimate 
(3 has the limiting distribution 

y/N0 - /3 ) ~ iV(0, iVI" 1 ^)) + Op (4^) m iV ^ oo. 

Proof. The first-order Taylor series expansion gives 

(2.7) U03) = U(/3 ) + D^U(/3) 1^0 - /3 ) + o p (||£ - /3 ||). 

Assume that the inverse of D / gU(/3) exists at j3 . Then (2.7) implies that 
- f3 )(l + o P (l)) = -(D /3 U(P))- 1 \ ( 3 =l3o U(P ) because TJ0) = 0. In ad- 
dition, it follows from (2.6) that 

(Vpxj^r 1 = -(Op(Vn) - my 1 

= -(ICS))- 1 + (i(p)r 2 ■ o p (Vn) + O (-^J . 

Thus (P — Pq)(1 + o(l)) can be written as 

(IOSo))- 1 !!^) + (I(/3 ))- 2 O P (-^)u(/3 ) +o(l)u(/3 ). 

Because I(/3 ) = O(iV) and U(/3 ) = O p (VN) from Theorems 2.1 and 2.2, 
it follows that (p - /3 )(1 + o P (l)) = (I^q))-^^) + Op(jf). Since p is 
consistent, this implies that /3 — P = (I(/3 )) _1 U(/3 ) + Op(^) as N — > oo. 
The theorem then follows immediately from Theorem 2.3. □ 

The average rate of convergence for P to /3 can be computed as follows. 

Corollary 2.1. E(fi - P ) = O(^) as N -> oo. 

Proof. This result follows immediately from Theorem 2.4. □ 

It follows from Theorem 2.4 that E0) -> /3 and cov(/3) ->• I~ 1 (P ) as the 
sample size increases. In the terminology of Godambe [10], it can be shown 
that the QL estimating equation is an asymptotically unbiased optimal es- 
timating equation [19]. 
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3. Application of quasi-likelihood functions. In this section we illustrate 
the use of the previously developed theory to data cited by Fingleton [9]. 
These well-known data [8, 9] were collected from Lansing Woods to examine 
whether the presence of hickory near a sample site tends to discourage the 
presence of maple. 

For notational convenience, let X(s) and Y(s) denote the corresponding 
indicator variables for the presence of hickory and maple at site s, respec- 
tively. We define the leftmost of a horizontal pair as the "first" and the 
uppermost of a vertical pair as the "first" as Fingleton did in his 1986 pa- 
per. One approach for testing independence between two correlated binary 
variables is to model these two variables by a conditional logistic model, 

(3.1) Bi = P(Y(s t ) = l\X(s t ) = Xl ) = ^^0+^) i = 1 256 _ 

1 + exp(/3 + PiXi) 

That is, at a specific site Sj, given the information whether hickory is present 
or not, the model indicates how likely it is that maple is present. Thus, if the 
estimate of (3\ in (3.1) is not significantly different from zero, we may say that 
X and Y are independent. The concept of this model is not very far from 
the traditional chi-squared approach. In fact, when no spatial dependence 
exists among observations, this approach is asymptotically equivalent to a 
chi-squared test in the analysis of contingency tables [1]. 

In geostatistics the elements of the matrix of correlations between sites 
are usually obtained from semivariogram models parameterized by con- 
stants denoting the nugget effect, the sill and the range [7]. For the maple 
data, we fit the exponential semivariogram model depicted in Figure 1. 
The correlation of maples at sites Sj and Sj can therefore be estimated 
by p y (si,Sj) = exp(— <i(sj,Sj)/1.091), where 1.091 comes from the "effective" 
range of the fitted exponential variogram model and d(sj,Sj) is the L2 dis- 
tance between sites Sj and Sj. 

Since the quasi-score function is nonlinear, the Newton-Raphson method, 
[}. +x = + (P^.Vj 1 P^)~ 1 Pj-Vj' 1 (y - 6j), was used to derive a numer- 
ical solution. An approximation of (3 obtained by iteration is (Po,$i) = 
(-0.34,-0.26) and var(/3 ) = 0.138, var(/3i) =0.001 and cov(/3 , $±) = -0.003. 

According to Theorem 2.4, we know that (3 is asymptotically normal. So, 
we can construct an approximate chi-squared test for the hypothesis that 
the parameter of independence j3\ is zero by using f3f/a 2 (Pi) = 6.76. This 
value is between those of the traditional chi-squared test (25.7) and Fingle- 
ton's deflated chi-squared value of 2.77 [9]. Therefore we may say that our 
approach provides a balanced point of view between the conservative tradi- 
tional chi-squared test and the liberal Fingleton deflated chi-squared test. 
In addition, the negative (3\ (—0.26) with a significant p-value (0.007 com- 
paring to the Xi table) shows strong evidence that the presence of hickory 
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discourages the presence of maple. Note that neither the traditional chi- 
squared test nor Fingleton's method could tell us whether the interaction 
between maple and hickory is positive or negative. 

4. Discussion. Analysis of non-Gaussian spatially correlated data is usu- 
ally difficult because of the complexity of the associated distributional forms. 
In this paper we have extended quasi-likelihood estimating equations to deal 
with spatially correlated data when the response variables are binary. We 
model the marginal response probability by a logistic regression; one benefit 
of this is that we can avoid the specification of likelihood functions. To em- 
ploy the proposed method we only need to know the means and covariances 
between observations. If correlations between observations are decreasing 
exponentially with distance (a reasonable assumption in many spatial set- 
tings), the QL estimates are shown to be asymptotically normal and consis- 
tent. This allows us to estimate and test fixed effects for data with correlation 
between sites. 

Although the QL approach discussed in this paper was originally moti- 
vated by a question in spatial statistics, the proposed approach is potentially 
applicable in other settings. In previous literature, estimating equation ap- 
proaches developed for dependent observations have mostly focused on re- 
peated measurements with block-diagonal covariance matrices, that is, with 
independence between subjects. The QL approach developed in this paper 
generalizes such methods; in this paper we show that, under good control 
on correlations, the asymptotics of QL estimates can hold for non-block- 
diagonal covariance matrices. This opens the possibility of applying QL ap- 




5 10 15 20 25 30 



Distance 

Fig. 1. Empirical semivariogram for the maple data. The solid line indicates the fitted 
variogram model with absolute sill 0.246 and range 1.091. 
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proaches to other situations where the data do not exhibit independence 
between subjects. 

We briefly outline a number of extensions of this work which we are 
currently pursuing. First, the proposed approach developed in this paper 
is for complete data. The forest ecology problem referred to in Section 1 
is not presented in this paper partly because observations are missing from 
several of the sites. Interestingly, the problem of missing data is related 
to the problem of analyzing data on an irregular lattice, another issue of 
note. In addition, the forest ecology data show possible large-scale spatial 
trends among observations. Conceivably this could be addressed by including 
appropriate covariates in the modeling work, although it remains to be seen 
how this would influence the asymptotics. On the other hand, as suggested 
by an anonymous referee, another possible solution for this problem is to fit 
two estimating equations simultaneously, one for the mean structure and the 
other for correlation structure. This approach was taken by Zeger [26] and 
McShane, Albert and Palmatier [21] for time series data by conditioning on 
a latent process. 

Our work in this paper has been restricted to exponential variograms, 
although the flexibility exists to allow different L p metrics. Nonetheless, it 
would be interesting to extend the methods of this paper to other variogram 
models that frequently occur in the analysis of spatial data. Moreover, the 
proposed method in this paper could be generalized to spatially correlated 
count data. However, this would involve reconstructing the covariance struc- 
ture, which itself would be a considerable task. We leave this to future work. 



Here we list some important results used in the proof of Theorem 2.1. Z 
is used to denote Y — 0. 

Lemma A.l. For centered binary variables Zi = Y% — Q%, we have 



E\Z\Zf ■ ■ ■ Z Pk ] = Var(Z 1 ) J E[Zf • • • Z p k k ] + (1 - 29 1 )E[Z 1 Z^ ■ ■ ■ Z p k k }. 
Proof. The expectation E\Z\Zff ■ ■ ■ Z Pk ] can be expanded to 



APPENDIX 



E[ZlZf ■ ■ ■ Zf ] 

= 0\E\Zf..-Zf\ 





{z2,...,Zk) 



X P[Z\ = 1 - 01, Z 2 = Z2, ■ ■ ■ , Z k = z k ). 
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The fc-fold summation in the second line can be converted to E(Z\Z^ ■ ■ ■ Z^ k ) + 
9\E(Z2 2 ■ ■ ■ Z^ k ). Inserting this result back into (A.l) gives the desired re- 
sult. □ 

Lemma A.l focuses on the "dependent" item in the expectation. When 
Z 1 is independent of (Z 2 , ...,Z k ), E(Z\Zf ■ ■ ■ Zf ) = var(Zi)£(Zf 2 • • • Z p k k ), 
which is exactly the same form of Lemma A.l when 0\ = 0.5. Thus (1 — 
2Q\)E{Z\Z^ ■ ■ ■ Z^ k ) of Lemma A.l can be considered to be the impact of 
dependence on the expected value, and we can expect that this impact is 
reduced for 9\ close to 0.5. In fact, E^ZxZ^Z^ = if Z\,Z 2 and Zj, are from 
a truncated Gaussian random field. 

Lemma A. 2. For variables Z satisfying Assumption 2.1 with the L\ 
metric, 

N N 2 - 2 

£$>v(Z(s n ),Z(s 42 )) < jf^aN. 

h&2 { P> 

Proof. The sum of correlations between a given site and all other 
sites over a region has a maximum value when is the center. So J2ii=i cov(A(sj 1 ) , X(sj 2 )) < 
E Sl &i m /«[ w ]L m /2j+i,[w]K2j+i),s l2 ) ) where Um denotes the upper right quar- 
ter of the region. This holds because var(Z(s)) < | and the sum of correla- 
tions in each quarter is the same. This lemma then follows from 

m/2 n/2 „ 2 

^ pd(([W]LW2J+l.[«]LW2J+l),* a ) = -1 + < p ~ p 

s io eiZm u=ot=o (i-p) □ 



Lemma A. 3. For variables Z satisfying Assumption 2.1 with the L 2 
metric, 



N N , / 1 \ 2 

2p 7T / 1 



J2J2cov(Z(s il ),Z(s t2 ))<( T ^ + - 



p 2 \log(p) 



aN. 



Proof. The sum of correlations between the center of the study region 
and all sites in 1Z m is — 1 + Z)"=o ^"=0 p^ t2+u2 , which can be shown to be 
less than + f (i^^y ) 2 by an integral test. An argument similar to Lemma 
A. 2 gives the desired result. □ 

Lemmas A. 2 and A. 3 are used to control correlations. Other details for 
controlling higher-order correlations are shown in [16] and omitted here. 

Let (fi(s) and (f(s) denote the corresponding characteristic functions of 
-^=a'Z and a normal random variable with mean zero and variance -^-a'Va, 
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respectively. Our approach to show Theorem 2.1 is to prove that </>(s) = 
<f(s) + 0(^=). Let <p*(s) denote the iVth truncated Taylor series of <p(s). 
Under Assumptions 2.1 and 2.2, we can show that the imaginary part of 
(f)*(s) is (9(-^=) and the real part 4>*(s) has an approximation 



1 



(A.2) 



imi s 2q 

1+ ^ q\2i M 
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E^i 



Vi 



,6 



J 9-j^ l q-j+i>%-j+2 



=>*<2+J-l> l i?+.7 



with error 0(^=), where 



af0;(l-0;) and ^ = ^^[6*1^(1 - ^)^(1 - %)ap d ( s » s ^. 



The proof requires numerous steps involving combinatorial analysis. Readers 
interested in the process can find the details in [16]. 
In addition, it can be shown that 



MO -**(«)!< 



(N + iy. 



Emm[ \s 



a'Z 



N+l 



,2(N + 1] 



a'Z 



N 



)=o(s N ). 



The A^th truncated Taylor series of (f(s), denoted by <p*(s), can also be 
shown to have an approximation (A.2) with error O(jj). The finite value 
of -^a'Va from Lemma A.l or A.2 (depending on the metric employed) 
then implies \tp(s) — = o(s N ). Theorem 2.1 is then obvious from these 

results. 
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